Comparison of AM-FM based features for robust speech recognition
Abstract
Effective feature extraction for robust speech recognition is a widely addressed topic, and much current effort goes into using non-stationary signal models in place of the quasi-stationary models that underlie standard features such as LPC or MFCC. Joint amplitude modulation and frequency modulation (AM-FM) is a classical non-parametric approach to non-stationary signal modeling, and new feature sets for automatic speech recognition (ASR) have recently been derived from a multi-band AM-FM representation of the signal. We consider several of these representations and compare their performance for robust speech recognition in noise, using the AURORA-2 database. We show that the FEPSTRUM representation is more effective than the others. We also propose an improvement to FEPSTRUM based on the Teager energy operator (TEO) and show that it can selectively outperform even FEPSTRUM.
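For readers unfamiliar with the Teager energy operator mentioned above, the following is a minimal sketch of its standard discrete form, Psi[x](n) = x(n)^2 - x(n-1) x(n+1). It is not the FEPSTRUM pipeline or the improvement proposed in the paper, whose details the abstract does not give; the function name `teager_energy` and the test tone parameters are assumptions chosen only for illustration.

```python
import math


def teager_energy(x):
    """Discrete Teager energy for the interior samples of x.

    psi[n] = x[n]^2 - x[n-1] * x[n+1]; the first and last samples
    are skipped because they lack a neighbour on one side.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test tone: a pure sinusoid A*cos(omega*n + phase).
amplitude = 2.0
omega = 0.3  # digital frequency in radians per sample
signal = [amplitude * math.cos(omega * n + 0.5) for n in range(200)]

psi = teager_energy(signal)

# For a pure sinusoid the operator is constant: A^2 * sin^2(omega).
expected = (amplitude * math.sin(omega)) ** 2
max_error = max(abs(value - expected) for value in psi)
print(max_error)
```

For a pure tone the output depends on both amplitude and frequency, which is why the operator is commonly paired with an energy separation step to recover the AM and FM components of each band.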
Similar resources
Smoothed Nonlinear Energy Operator-Based Amplitude Modulation Features for Robust Speech Recognition
In this paper we present a robust feature extractor that uses smoothed nonlinear energy operator (SNEO)-based amplitude modulation features for a large vocabulary continuous speech recognition (LVCSR) task. SNEO estimates the energy required to produce the AM-FM signal, and then the estimated energy is separated into its amplitude and frequency components using an energy separa...
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have performed well in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
Continuous-time models for AM-FM signal demodulation and their application to speech recognition
Automatic speech recognition (ASR) systems can benefit from incorporating into their acoustic processing new features that account for various nonlinear and time-varying phenomena during speech production. In this paper, we develop robust continuous-time expansions used to demodulate the instantaneous amplitudes and frequencies of the speech resonances and extract novel acoustic features from s...
AM-FM Based Robust Speaker Identification in Babble Noise
Speech babble is one of the most challenging kinds of noise interference for speech and speaker recognition systems because of its speaker- and speech-like characteristics. The performance of such systems degrades strongly in the presence of background noise such as babble. Existing techniques address this problem by additional processing of the speech signal to remove noise. In contrast to existing works, the aim...
A comparative study on AM and FM features
In this paper, we investigate the advantages of frequency modulation (FM) features by conducting speech recognition experiments and statistical analysis. The importance of temporal aspects in speech recognition has been discussed along with the importance of amplitude modulation (AM) and frequency modulation. Recently, we have proposed a speech recognition system that is based on the combinatio...